This script estimates the most likely parameters for spatial (using Kernel Density Estimation - KDE) memory with a temporal decay (negative exponential) component. Firstly we use the amt package to estimate a KDE bandwidth (which is the standard deviation when using Gaussian kernels), using the ‘reference’ bandwidth. To get a population-level estimate for fitting a hierarchical model we use the mean between individuals. For the temporal decay component, we are trying to estimate a negative exponential rate parameter that reduces the influence of previous locations the further they are in the past. There is a function to estimate a temporal decay value for some given parameters, which can be optimised using maximum likelihood for each individual animal, and then a function to estimate a population-level temporal decay parameter. After estimating the temporal decay parameter(s), there is a function to estimate the previous space use density for all used and randomly sampled steps, using the estimated KDE bandwidth and temporal decay parameter(s), which is used in the step selection model fitting.

Load packages


options(scipen=999)

library(tidyverse)
packages <- c("amt", "terra", "tictoc", "matrixStats", "beepr", "ks", "viridis")
walk(packages, require, character.only = T)

Import buffalo data

These data include the random steps and covariates. The random steps are included in this dataset so that the spatiotemporal memory density can be estimated at every used and random step, but they are not used to estimate the bandwidth or temporal decay parameters.


buffalo_data_rand_steps <- 
  read_csv("outputs/buffalo_parametric_popn_covs_GvM_10rs_2024-02-18.csv")
## Rows: 1165406 Columns: 22
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (18): id, burst_, x1_, x2_, y1_, y2_, sl_, ta_, dt_, hour_t2, step_id_, y, ndvi_temporal, veg_herby, canopy_co...
## lgl   (1): case_
## dttm  (3): t1_, t2_, t2_rounded
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# to ensure that the time is in the right timezone
attr(buffalo_data_rand_steps$t1_, "tzone") <- "Australia/Queensland"
attr(buffalo_data_rand_steps$t2_, "tzone") <- "Australia/Queensland"
attr(buffalo_data_rand_steps$t2_rounded, "tzone") <- "Australia/Queensland"

Check the GPS data through time to ensure that there are no large gaps in the data, and to identify individuals that have poor data quality.


# check the timeline of GPS data across individuals
buffalo_data_rand_steps %>% ggplot(aes(x = t1_, y = factor(id), 
                                       colour = factor(id))) +
  geom_point(alpha = 0.1) +
  scale_y_discrete("Buffalo ID") +
  scale_x_datetime("Date") +
  scale_colour_viridis_d() +
  theme_bw() +
  theme(legend.position = "none")


# individuals to keep - remove 2354 due to poor data quality in the first couple months
buffalo_ids <- c(2005, 2014, 2018, 2021, 2022, 2024, 2039, 
                 2154, 2158, 2223, 2327, 2387, 2393) # 2354, 
# remove individuals not in the id vector
buffalo_data_rand_steps <- buffalo_data_rand_steps %>% 
  filter(id %in% buffalo_ids)

buffalo_data_rand_steps %>% filter(y == 1) %>% 
  ggplot(aes(x = t1_, y = factor(id), colour = factor(id))) +
  geom_point(alpha = 0.05) +
  scale_y_discrete("Buffalo ID") +
  scale_x_datetime("Date") +
  scale_colour_viridis_d() +
  theme_bw() +
  theme(legend.position = "none")

Preparing data for KDE estimation


# convert to track object to use the amt package for KDE estimation
buffalo_data_pres_track <- buffalo_data_rand_steps %>% 
  filter(y == 1 & id %in% buffalo_ids) %>% 
  mk_track(id = id, x1_, y1_, t1_, order_by_ts = T, all_cols = T, crs = 3112) %>% 
  arrange(id)

head(buffalo_data_pres_track)
## # A tibble: 6 × 22
##       x_        y_ t_                     id burst_    x2_       y2_   sl_     ta_ t2_                   dt_
## *  <dbl>     <dbl> <dttm>              <dbl>  <dbl>  <dbl>     <dbl> <dbl>   <dbl> <dttm>              <dbl>
## 1 41968. -1435673. 2018-07-25 11:04:23  2005      1 41921. -1435656.  50.7  1.37   2018-07-25 12:04:39  60.3
## 2 41921. -1435656. 2018-07-25 12:04:39  2005      1 41778. -1435602. 152.  -0.0214 2018-07-25 13:04:17  59.6
## 3 41778. -1435602. 2018-07-25 13:04:17  2005      1 41840. -1435637.  70.7  2.99   2018-07-25 14:04:39  60.4
## 4 41840. -1435637. 2018-07-25 14:04:39  2005      1 41655. -1435606. 188.  -2.80   2018-07-25 15:04:27  59.8
## 5 41655. -1435606. 2018-07-25 15:04:27  2005      1 41618. -1435610.  37.1  0.285  2018-07-25 16:04:24  59.9
## 6 41618. -1435610. 2018-07-25 16:04:24  2005      1 41687. -1436127. 522.   1.58   2018-07-25 17:04:28  60.1
## # ℹ 11 more variables: t2_rounded <dttm>, hour_t2 <dbl>, case_ <lgl>, step_id_ <dbl>, y <dbl>, ndvi_temporal <dbl>,
## #   veg_herby <dbl>, canopy_cover <dbl>, slope <dbl>, cos_ta_ <dbl>, log_sl_ <dbl>

Estimate the KDE kernel bandwidth

Initially we were estimating the kernel bandwidth (sd parameter) and the temporal decay component concurrently, based on the parameters that maximised the next step density. This is an interesting approach, and may be useful for inferring the ‘strength’ of memory, and how that differs between individuals, but as there is an additional inference process - the step selection model fitting, we thought it would be best to keep the procedure for the estimating the bandwidth simple. Here we assess the kernel bandwidth estimated by several methods, which is constrained to be the same in the x and the y direction, features of landscape can produce asymmetry, although we did not want to impose asymmetry when predicting in novel areas.

We are using KDE rather than a method that considers autocorrelation, such as AKDE, as when generating simulated trajectories, the density needs to be updated at every time step, and evaluated at each proposed step. Calculating densities using KDE with normal kernels is very fast, and is straightforward to include in the simulation model, as we can use the dnorm function with a vector of x (and y) locations, rather than estimating the AKDE (or similar) and incorporating as a spatial layer.


# change plotting to display four base plots at once
par(mfrow = c(2, 2))

buffer <- 5000
res <- 25

bandwidth_ref <- vector(mode = "list", length = length(buffalo_ids))
bandwidth_ref_vector <- c()
bandwidth_ref_hr <- vector(mode = "list", length = length(buffalo_ids))

bandwidth_pi <- vector(mode = "list", length = length(buffalo_ids))
bandwidth_pi_vector <- c()
bandwidth_pi_hr <- vector(mode = "list", length = length(buffalo_ids))

# bandwidth_lscv <- vector(mode = "list", length = length(buffalo_ids))
# bandwidth_lscv_vector <- c()
# bandwidth_lscv_hr <- vector(mode = "list", length = length(buffalo_ids))


tic(msg = "Total time for 13 individuals")

for(i in 1:length(buffalo_ids)) {

  # subset by buffalo id
  buffalo_data_id <- buffalo_data_pres_track %>% filter(y == 1 & id == buffalo_ids[i])
    
  # create template raster
  # create extent of the raster
  xmin <- min(buffalo_data_id$x2_) - buffer
  xmax <- max(buffalo_data_id$x2_) + buffer
  ymin <- min(buffalo_data_id$y2_) - buffer
  ymax <- max(buffalo_data_id$y2_) + buffer
  template_raster <- rast(xmin = xmin, xmax = xmax, 
                          ymin = ymin, ymax = ymax, 
                          res = res, crs = crs("epsg:3112"))
  
  
  # reference bandwidth
  tic(msg = "Reference bandwidth")
  bandwidth_ref[[i]] <- hr_kde_ref(buffalo_data_id)
  toc()
  print(bandwidth_ref[[i]])
  bandwidth_ref_vector[i] <- bandwidth_ref[[i]][1]
  
  
  bandwidth_ref_hr[[i]] <- hr_kde(buffalo_data_id, 
                           h = bandwidth_ref[[i]],
                           trast = template_raster, 
                           levels = c(0.5, 0.75, 0.95))
  
  plot(bandwidth_ref_hr[[i]]$ud, 
       main = paste0("Reference bandwidth, Buffalo ", buffalo_ids[i]))
  
  
  # plug-in bandwidth
  tic(msg = "Plug-in bandwidth")
  bandwidth_pi[[i]] <- hr_kde_pi(buffalo_data_id)
  toc()
  print(bandwidth_pi[[i]])
  bandwidth_pi_vector[i] <- bandwidth_pi[[i]][1]

  bandwidth_pi_hr[[i]] <- hr_kde(buffalo_data_id,
                           h = bandwidth_pi[[i]],
                           trast = template_raster, 
                           levels = c(0.5, 0.75, 0.95))
  
  plot(bandwidth_pi_hr[[i]]$ud, 
       main = paste0("Plug-in bandwidth, Buffalo ", buffalo_ids[i]))
  
  
  # least-squares cross validation bandwidth
  # estimate bandwidth using least-squares cross validation
  # tic(msg = "LSCV bandwidth")
  # bandwidth_lscv[[i]] <- hr_kde_lscv(buffalo_data_id, 
  #                                    trast = template_raster, which_min = "local")
  # toc()
  # bandwidth_lscv_vector[i] <- bandwidth_lscv[[i]][1]
  # 
  # bandwidth_lscv_hr[[i]] <- hr_kde(buffalo_data_id,
  #                          h = bandwidth_lscv[[i]],
  #                          trast = template_raster, levels = c(0.5, 0.75, 0.95))
  # plot(bandwidth_lscv_hr[[i]]$ud, 
  #      main = paste0("LSCV bandwidth, Buffalo ", buffalo_ids[i]))
  
}
## Reference bandwidth: 0 sec elapsed
## [1] 845.9741 845.9741
## Plug-in bandwidth: 0.03 sec elapsed
## [1] 85.79459 25.29613
## Reference bandwidth: 0 sec elapsed
## [1] 870.951 870.951
## Plug-in bandwidth: 0.02 sec elapsed
## [1] 80.31529 57.90325

## Reference bandwidth: 0.02 sec elapsed
## [1] 514.6824 514.6824
## Plug-in bandwidth: 0.01 sec elapsed
## [1] 51.91688 54.90821
## Reference bandwidth: 0 sec elapsed
## [1] 451.5416 451.5416
## Plug-in bandwidth: 0.02 sec elapsed
## [1] 46.21448 43.85092

## Reference bandwidth: 0 sec elapsed
## [1] 490.8895 490.8895
## Plug-in bandwidth: 0 sec elapsed
## [1] 58.24006 49.54056
## Reference bandwidth: 0 sec elapsed
## [1] 677.4343 677.4343
## Plug-in bandwidth: 0.01 sec elapsed
## [1] 25.01523 60.02277

## Reference bandwidth: 0 sec elapsed
## [1] 547.5389 547.5389
## Plug-in bandwidth: 0.02 sec elapsed
## [1] 65.66955 27.37679
## Reference bandwidth: 0 sec elapsed
## [1] 172.9515 172.9515
## Plug-in bandwidth: 0 sec elapsed
## [1] 24.84565 25.78731

## Reference bandwidth: 0 sec elapsed
## [1] 957.0183 957.0183
## Plug-in bandwidth: 0 sec elapsed
## [1] 76.14371 48.52842
## Reference bandwidth: 0 sec elapsed
## [1] 601.7503 601.7503
## Plug-in bandwidth: 0.02 sec elapsed
## [1] 58.74228 98.44715

## Reference bandwidth: 0 sec elapsed
## [1] 406.984 406.984
## Plug-in bandwidth: 0 sec elapsed
## [1] 23.48089 38.85835
## Reference bandwidth: 0 sec elapsed
## [1] 473.602 473.602
## Plug-in bandwidth: 0.03 sec elapsed
## [1] 38.86717 20.80289

## Reference bandwidth: 0 sec elapsed
## [1] 420.2645 420.2645
## Plug-in bandwidth: 0.04 sec elapsed
## [1] 42.22782 73.21382

toc()
## Total time for 13 individuals: 73.5 sec elapsed

Use the mean bandwidth for population level analysis

It’s clear that the bandwidths estimated by the different methods are very different. As we are predicting from these models, we opted for the reference bandwidth that undersmooths the space use, describing a broad-familiarity with an area, rather than the plug-in bandwidth that has many small discrete modes that conflate with the habitat in these areas. We did not run simulations with the plug-in bandwidth to test our assumptions however.


hist(bandwidth_ref_vector, breaks = 10)

hist(bandwidth_pi_vector, breaks = 10)


mean_kde_sd <- mean(bandwidth_ref_vector)
mean_kde_sd
## [1] 571.6602

Plotting the estimated spatial sd parameters


range <- 3500 # for the x-axis - tune as needed

spatial_sds <- data.frame(-range:range, 
                          sapply(c(bandwidth_ref_vector, mean_kde_sd), 
                                 function(sd) dnorm(-range:range, mean = 0, sd = sd)))
colnames(spatial_sds) <- c("x", buffalo_ids, "mean")

spatial_sds_long <- pivot_longer(spatial_sds, cols = !1, 
                                 names_to = "id", values_to = "value")

# Create color mapping
unique_groups <- unique(spatial_sds_long$id)
colors <- viridis(length(unique_groups))
names(colors) <- unique_groups
colors["mean"] <- "red"


ggplot(spatial_sds_long) +
  geom_line(aes(x = -x, y = value, colour = id), size = 1) +
  scale_x_continuous("Metres from location", 
                     breaks = seq(-3000, 3000, 1000)) +
  scale_y_continuous("Density") +
  # scale_colour_viridis_d("ID") +
  scale_colour_manual("ID", values = colors) +
  ggtitle("Estimated KDE bandwidth (Gaussian SD)") +
  theme_classic() +
  theme(legend.position = "right",
        legend.key.height = unit(0.5, "cm"))


ggsave(paste0("outputs/plots/manuscript_figs/spatial_KDE_sd_byID_", 
              Sys.Date(), ".png"),
       width=150, height=90, units="mm", dpi = 600)

ggplot(spatial_sds_long %>% filter(!id == 2154)) +
  geom_line(aes(x = -x, y = value, colour = id), size = 1) +
  scale_x_continuous("Metres from location", 
                     breaks = seq(-3000, 3000, 1000)) +
  scale_y_continuous("Density") +
  # scale_colour_viridis_d("ID") +
  scale_colour_manual("ID", values = colors) +
  ggtitle("Estimated KDE bandwidth (Gaussian SD)") +
  theme_classic() +
  theme(legend.position = "right",
        legend.key.height = unit(0.5, "cm"))


ggsave(paste0("outputs/plots/manuscript_figs/spatial_KDE_sd_byID_no2154_",
              Sys.Date(), ".png"),
       width=150, height=90, units="mm", dpi = 600)

Estimating the temporal decay component

To incorporate decaying memory into model fitting and predictions, we used a temporally decaying intensity of previous space use approach by combining kernel density estimation with weights that exponentially decay the further they are in the past, representing a gradual forgetting process. For our approach, the density \(f\) at the current location \(s\) and time \(t\) is defined by

\[ f(s|t) = \frac{\sum_{i=1}^{n} \exp^{-\gamma(t-t_i)} K_h(x - x_i) K_h(y - y_i)}{\sum_{i=1}^{n} \exp^{-\gamma(t-t_i)}}. \]

where \(\gamma\) defines the strength of temporal decay, \(t_i\) is the time of previous locations, \(K_h\) is the kernel function \(\sim \mathcal{N}(\mu = 0, \sigma)\), where the \(x\) distance is determined by the current location \(x\) and the previous locations \(x_i\), which is the same in the \(y\) direction. For numerical stability, we used the log-sum-exp trick to sum over the densities relating to each previous location in the memory period. Similarly to Rheault2021-od, we excluded locations from the past 24 hours, as these locations reflect the autocorrelation in the movement process rather than a memory process.

We need to set a ‘memory period’ to define the time window over which the memory process is considered. The memory period should cover the duration that the memory decays to nearly 0.

This function subsets the memory period and memory delay by the number of locations, rather than by the time. For temporally consistent data this function will work the same as the function below, which subsets by time.

Previous space use density function - subsetting by number of locations


density_space_time_loc_subset <- function(locations_x, 
                                          locations_y,
                                          locations_time,
                                          spatial_sd,
                                          gamma_param,
                                          # in the number of locations
                                          memory_period_locs, 
                                          # number of location to exclude
                                          memory_delay_locs
                                          ) {
  
  # create object for log_likelihood
  log_likelihood <- 0
  
  # start from the first location AFTER a duration of the memory period
  for(i in 1:(length(locations_x)-memory_period_locs)) {
    
    # this subsets the locations from the start to the end of the memory period 
    # (until the 24 hour memory_delay) for x and y coords and time
    
    # difference between locations in the x direction
    diff_x <- locations_x[i+memory_period_locs] - 
      locations_x[i:(i+memory_period_locs-memory_delay_locs)]
    
    # difference between locations in the y direction
    diff_y <- locations_y[i+memory_period_locs] - 
      locations_y[i:(i+memory_period_locs-memory_delay_locs)]
    
    # difference between locations in time
    diff_time <- as.numeric(
      difftime(locations_time[i+memory_period_locs], 
               locations_time[i:(i+memory_period_locs-memory_delay_locs)], 
               units = "hours"))
    
    num_locs <- length(diff_x)
    
    # as all the parameters are on the log scale, then we can simply add them together to get the probability density for a given SD parameter and exponential decay parameter
    # we use the normal density function, where the density is defined by the distance (in the x or y direction, separately) and the SD parameter
    # the temporal decay component is already o
    log_joint_density <- dnorm(diff_x, mean = 0, sd = spatial_sd, log = TRUE) + 
      dnorm(diff_y, mean = 0, sd = spatial_sd, log = TRUE) + 
      (-gamma_param*diff_time)
    
    # we subtract the sum of the log temporal decay component, which is equivalent to dividing by the sum of the exponential components to normalise
    # we are also normalising by the number of locations - this isn't essential as they all should be the same length, but in the hour-subset version the number of locations will differ, and this will need to be included, so we have been consistent and included it here
    log_likelihood <- log_likelihood + 
      logSumExp(log_joint_density) - 
      log(num_locs) - 
      logSumExp(-gamma_param*diff_time)
    
  }
  return(-log_likelihood)
}

Testing the MLE optimisation function for a single individual

This function outputs a value of the negative log-likelihood, which will be minimised during the optimisation.

Here we are using mean KDE bandwidth across individuals, although we could also use that individual’s bandwidth by indexing through the bandwidth_ref_vector object.

Subsetting by the number of locations


memory_period_locs <- 1000 # locations
memory_delay_locs <- 24 # locations

# buffalo id
i = 1

# picking a single buffalo from the list of ids, and selecting only the used points
buffalo_used <- buffalo_data_pres_track %>% 
  filter(id == buffalo_ids[i] & y == 1)

tic(msg = "Single likelihood calculation")

density_space_time_loc_subset(
  # pull out a vector of x coordinates
  locations_x = buffalo_used$x_, 
  # pull out a vector of y coordinates
  locations_y = buffalo_used$y_, 
  # pull out a vector of times
  locations_time = buffalo_used$t_, 
  # mean bandwidth across individuals
  spatial_sd = mean_kde_sd, 
  # test gamma parameter
  gamma_param = 0.01,
  # Around 500-1000 locations is sufficient in our case for the memory to decay to near 0
  memory_period_locs = memory_period_locs, 
  # excluding the most recent 24 locations
  memory_delay_locs = memory_delay_locs) 
## [1] 207247.5

toc()
## Single likelihood calculation: 5.75 sec elapsed

Estimating the temporal decay (Gamma) parameter for a single buffalo - subsetting by the number of locations


tic(msg = "ML optimisation for a single buffalo")

space_time_param_locs <- optim(
  0.01, # starting values for the gamma parameter
  density_space_time_loc_subset, # function
  locations_x = buffalo_used$x_, # single buffalo's used locations
  locations_y = buffalo_used$y_, 
  locations_time = buffalo_used$t_,
  spatial_sd = mean_kde_sd, # mean bandwidth across individuals
  memory_period_locs = memory_period_locs, 
  memory_delay_locs = memory_delay_locs,
  method = "L-BFGS-B", # this method allows for multiple parameters and box constraints
  lower = 0, upper = 1) # box constraints for the gamma parameter

toc()
## ML optimisation for a single buffalo: 176.39 sec elapsed

space_time_param_locs
## $par
## [1] 0.009754395
## 
## $value
## [1] 207246.6
## 
## $counts
## function gradient 
##       16       16 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
beep(sound = 2)

Plotting the exponential decay component


# time in the past 
past_time <- 0:memory_period_locs
temporal_decay_weight <- exp(-space_time_param_locs$par[1]*(past_time))

ggplot() +
  geom_line(data = data.frame(past_time, temporal_decay_weight),
            aes(x = past_time, y = temporal_decay_weight)) +
  scale_x_reverse("Number of locations in the past") +
  labs(y = "Weight", title = "Previous space use density temporal decay") +
  theme_bw()

Looping over all individuals to get a temporal decay parameter for each individual, with each individual’s bandwidth

This will take a while to run, so we are therefore loading the file when knitting the document. We also had memory issues when trying to run this function when knitting the document, which worked fine when running the code in the console.


# create an exmpty vector to store the estimated temporal decay parameters
exp_gamma_params_locs_subset <- c()

if(file.exists("outputs/temporal_decay_param_locs_list.rds")) {
  
  temporal_decay_param_locs_list <- 
    readRDS("outputs/temporal_decay_param_locs_list.rds")
  temporal_decay_param_locs_list
  
  # create a vector of the estimated gamma parameters
  for(j in 1:length(buffalo_ids)) {
    exp_gamma_params_locs_subset[j] <- temporal_decay_param_locs_list[[j]]$par
  }
  
  exp_gamma_params_locs_subset
  
  
} else {
  
  
  # create a list to store the temporal decay parameters
  temporal_decay_param_locs_list <- vector(mode = "list", 
                                           length = length(buffalo_ids))
  
  for(j in 1:length(buffalo_ids)) {
    
    tic("Each individual ML optimisation")
    # select the individual - ensure this is only USED points, and not randomly sampled as well
    buffalo_used <- buffalo_data_pres_track %>% filter(id == buffalo_ids[j])
  
    temporal_decay_param_locs_list[[j]] <- optim(
      0.01, # starting values for the gamma parameter
      density_space_time_loc_subset, # function
      locations_x = buffalo_used$x_, # single buffalo's used locations
      locations_y = buffalo_used$y_,
      locations_time = buffalo_used$t_,
      spatial_sd = mean_kde_sd, # value for the spatial sd (KDE bandwidth) parameter
      memory_period_locs = memory_period_locs,
      memory_delay_locs = memory_delay_locs,
      method = "L-BFGS-B", # this method allows for multiple parameters and box constraints
      lower = 0, upper = 1) # box constraints for the gamma parameter
  
    print(temporal_decay_param_locs_list[[j]])
    exp_gamma_params_locs_subset[j] <- temporal_decay_param_locs_list[[j]]$par
    
    toc()
    
  }
  
  saveRDS(temporal_decay_param_locs_list, 
          file = "outputs/temporal_decay_param_locs_list.rds")
  
  beep(sound = 2)

}
##  [1] 0.009754395 0.006238564 0.005487378 0.009878391 0.005482380 0.009604574 0.007778278 0.006783557 0.010025430
## [10] 0.005120016 0.011403410 0.007191531 0.004365455

Extracting parameters into data frame


memory_params <- data.frame("id" = buffalo_ids, 
                            "kde_sd" = bandwidth_ref_vector, 
                            "temporal_decay" = exp_gamma_params_locs_subset)

write_csv(memory_params, 
          paste0("outputs/memory_params_KDE_exp_decay_", Sys.Date(), ".csv"))

For population-level estimation of the temporal decay parameter

Previously we have estimated the temporal decay (Gamma) parameter for a single buffalo. Now we will estimate the temporal decay parameter for all the individuals at once, using the mean_kde_sd as the spatial decay parameter.

To achieve this we just change the memory process function to have locations_x, locations_y and locations_time in lists that are iterated over in a loop. Here we are changing the function that subsetted by the number of locations, although changing the function that subsetted by the number of hours would be equivalent.

This is the function that we used to estimate the temporal decay parameter of the simulations in the paper.


density_space_time_ALL_locs_subset <- function(
    # this time we will pass in a list of dataframes (one for each buffalo), 
    # and we will parse them out in the function
    data_list, 
    spatial_sd,
    gamma_param,
    # the memory period (in locations) should cover the duration that the memory decays to nearly 0. 
    memory_period_locs, 
    # number of locations to exclude
    memory_delay_locs 
    ) {
  
  # create object for log_likelihood
  log_likelihood <- 0
  
  n = length(data_list)
  
  for(j in 1:n)
  {
    # index j corresponds to individuals
    # index each dataset from the list, and then extract the relevant vector
    
    this_locations_x <- data_list[[j]]$x_
    this_locations_y <- data_list[[j]]$y_
    this_locations_time <- data_list[[j]]$t_
    
    for(i in 1:(length(this_locations_x)-memory_period_locs))
    {
        # spatial mixture density component
        # this subsets the locations from the start to the end (until the 24 hour delay) 
      # of the memory period for x and y coords and time
      
        # difference between locations in the x direction
        diff_x <- this_locations_x[i+memory_period_locs] - 
          this_locations_x[i:(i+memory_period_locs-memory_delay_locs)]
        
        # difference between locations in the y direction
        diff_y <- this_locations_y[i+memory_period_locs] - 
          this_locations_y[i:(i+memory_period_locs-memory_delay_locs)]
        
        # difference between locations in time
        diff_time <- as.numeric(
          difftime(this_locations_time[i+memory_period_locs], 
                   this_locations_time[i:(i+memory_period_locs-memory_delay_locs)], 
                   units = "hours"))
        
        n_locs <- length(diff_x)
    
        # as all the parameters are on the log scale, then we can simply add them 
        # together to get the probability density for a given SD parameter 
        # and exponential decay parameter
        
        # we use the normal density function, 
        # where the density is defined by the distance 
        # (in the x or y direction, separately) and the SD parameter
        # the temporal decay component is already o
        log_joint_density <- dnorm(diff_x, mean = 0, sd = spatial_sd, log = TRUE) + 
            dnorm(diff_y, mean = 0, sd = spatial_sd, log = TRUE) + 
            (-gamma_param*diff_time)
    
        # we subtract the sum of the log temporal decay component, 
        # which is equivalent to dividing by the sum of the exponential components to normalise, 
        # and normalise by the number of locations (which should be the same, 
        # but when subsetting by hours in the functions below this will not be the case)
        log_likelihood <- log_likelihood + 
          logSumExp(log_joint_density) - 
          log(n_locs) - 
          logSumExp(-gamma_param*diff_time)
    }
  }
  
  return(-log_likelihood)
  
}

Running the optimisation


if(file.exists("outputs/optim_space_time_Gamma_param_ALLoptim.rds")) {
  
  space_time_ALLoptim <- readRDS("outputs/optim_space_time_Gamma_param_ALLoptim.rds")
  space_time_ALLoptim
  
  
} else {
  

  # split the date into a list of dataframes, one for each individual
  buffalo_data_pres_list <- split(x = buffalo_data_pres_track, 
                                  f = buffalo_data_pres_track$id)
    
  tic(msg = "Optimising over all individuals simulatenously")
  
  space_time_ALLoptim <- optim(
    0.01, # starting values for the gamma parameter
    density_space_time_ALL_locs_subset, # function
    data_list = buffalo_data_pres_list, # list of dataframes
    spatial_sd = mean_kde_sd, # value for the spatial sd (KDE bandwidth) parameter
    memory_period_locs = 1000, # number of locations to include in the memory period
    memory_delay_locs = 24, # number of locations to exclude from the memory period
    method = "L-BFGS-B", # this method allows for multiple parameters and box constraints
    lower = 0, upper = 1) # box constraints for the gamma parameter
  
  print(space_time_ALLoptim)
  toc()
  
  # should be just a single parameter for the gamma parameter
  saveRDS(space_time_ALLoptim, 
          file = "outputs/optim_space_time_Gamma_param_ALLoptim.rds")
  
  beep(sound = 2)

}
## $par
## [1] 0.007366395
## 
## $value
## [1] 2019377
## 
## $counts
## function gradient 
##       24       24 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Writing ALLoptim (optimisation of all individuals at once) memory parameters to csv


# extracting the temporal decay parameter
exp_gamma_ALLoptim <- space_time_ALLoptim$par
exp_gamma_ALLoptim
## [1] 0.007366395

# mean KDE parameters
mean_kde_sd
## [1] 571.6602

# create dataframe and write to file - this dataframe will be read by the simulation scripts
memory_params_ALLoptim <- data.frame(exp_gamma_ALLoptim, mean_kde_sd)
write.csv(memory_params_ALLoptim, 
          file = paste0("outputs/memory_params_ALLoptim_", Sys.Date(), ".csv"))

Plotting the optimised temporal decay parameter(s)


memory_decay <- data.frame(0:memory_period_locs, 
                           sapply(c(exp_gamma_params_locs_subset, exp_gamma_ALLoptim), 
                                  function(gamma) exp(-gamma*(0:memory_period_locs))))
# add names to the dataframe
colnames(memory_decay) <- c("x", buffalo_ids, "all")

# prepare for plotting with ggplot
memory_decay_long <- pivot_longer(memory_decay, cols = !1, 
                                  names_to = "id", values_to = "value")

# Create color mapping
unique_groups <- unique(spatial_sds_long$id)
colors <- viridis(length(unique_groups))
names(colors) <- unique_groups
colors["all"] <- "red"


ggplot(memory_decay_long) + #  %>% filter(name != "2154")
  geom_line(aes(x = -x, y = value, colour = id), size = 1) +
  scale_x_continuous("Hours (past)") +
  scale_y_continuous("Density") +
  # scale_colour_viridis_d("ID") +
  scale_colour_manual("ID", values = colors) +
  ggtitle("Estimated memory decay function") +
  theme_classic() +
  theme(legend.position = "right",
        legend.key.height = unit(0.5, "cm"))


ggsave(paste0("outputs/plots/manuscript_figs/",
              "temporal_decay_byID_ALLoptim_loc_subset_", 
              Sys.Date(), ".png"),
       width=150, height=90, units="mm", dpi = 600)


ggplot(memory_decay_long%>% filter(id != "2154")) + 
  geom_line(aes(x = -x, y = value, colour = id), size = 1) +
  scale_x_continuous("Hours (past)") +
  scale_y_continuous("Density") +
  # scale_colour_viridis_d("ID") +
  scale_colour_manual("ID", values = colors) +
  ggtitle("Estimated memory decay function") +
  theme_classic() +
  theme(legend.position = "right",
        legend.key.height = unit(0.5, "cm"))


ggsave(paste0("outputs/plots/manuscript_figs/",
              "temporal_decay_byID_ALLoptim_loc_subset_no2154_", 
              Sys.Date(), ".png"),
       width=150, height=90, units="mm", dpi = 600)

Adding previous space use density to the used and random steps

This function subsets the previous used locations within the memory period, and calculates the previous location density for all used and random steps based on the estimated memory parameters


spatial_temporal_density_function <- function(
    data_input,
    id_val,
    memory_period, # in hours
    memory_delay, # in hours
    spatial_sd,
    temporal_decay_gamma) {
  
  # subset by individual
  # all locations
  id_all_locations <- data_input %>% filter(id == id_val)
  
  # just the used locations to estimate the density 
  # (we want to estimate the previous space use density only to used locations)
  id_used_locations <- data_input %>% filter(y == 1 & id == id_val)
  
  # empty vectors to store the previous space use density
  location_density <- c()
  
  for(i in 1:nrow(id_all_locations)){
    
    # current location
    location_x <- id_all_locations[i,]$x2_
    location_y <- id_all_locations[i,]$y2_
    location_time <- id_all_locations[i,]$t2_
    
    # memory period start and end to subset with
    memory_start_time <- location_time - as.difftime(memory_period, units = "hours")
    memory_end_time <- location_time - as.difftime(memory_delay, units = "hours")
    
    # subset locations to use to estimate previous space use density
    memory_x <- id_used_locations[id_used_locations$t2_ >= memory_start_time & 
                                    id_used_locations$t2_ <= memory_end_time, ]$x1_
    
    memory_y <- id_used_locations[id_used_locations$t2_ >= memory_start_time & 
                                    id_used_locations$t2_ <= memory_end_time, ]$y1_
    
    memory_time <- id_used_locations[id_used_locations$t2_ >= memory_start_time & 
                                       id_used_locations$t2_ <= memory_end_time, ]$t1_
    
    
    # difference in x, y and time to all points in the memory subset
    diff_x <- location_x - memory_x
    diff_y <- location_y - memory_y
    diff_time <- as.numeric(difftime(location_time, memory_time, units = "hours"))
    
    # calculate the density in relation to all the points in the memory subset
    log_joint_density <- dnorm(diff_x, mean = 0, sd = spatial_sd, log = TRUE) + 
      dnorm(diff_y, mean = 0, sd = spatial_sd, log = TRUE) + 
      (-temporal_decay_gamma*diff_time)
    
    # estimate the previous space use density for that location
    location_density[i] <- logSumExp(log_joint_density) - 
      logSumExp(-temporal_decay_gamma*diff_time)
    
  }
  
  return(location_density)
  
}

Adding the previous space use density to the data using individually estimated memory parameters

For fitting individual-level models (not recommended when fitting population-level or hierarchical models)


memory_period <- 1000 # in hours
memory_delay <- 24 # in hours

buffalo_id_memory_list <- vector(mode = "list", length = length(buffalo_ids))

for(i in 1:length(buffalo_ids)) {
  
  tic(msg = "Adding previous space use density to used and random steps")
  
   buffalo_id_memory_list[[i]] <- buffalo_data_rand_steps %>% 
     filter(id == buffalo_ids[i]) %>% 
     mutate(kde_memory_density_log = 
              spatial_temporal_density_function(
                ., 
                id_val = buffalo_ids[i],
                memory_period = memory_period,
                memory_delay = memory_delay,
                # index over the bandwidth and temporal decay parameters
                spatial_sd = memory_params$kde_sd[i],
                temporal_decay_gamma = memory_params$temporal_decay[i])) 
   toc()
}

buffalo_data_rand_steps_memory <- bind_rows(buffalo_id_memory_list)
write_csv(buffalo_data_rand_steps_memory, 
          paste0("outputs/buffalo_popn_GvM_KDEmem_ID_10rs_", Sys.Date(), ".csv"))

beep(sound = 2)

Adding the previous space use density to the data using population estimated memory parameters

The only thing that changes here is that instead of indexing over the bandwidth and temporal decay parameters there is only one of each memory parameter


buffalo_ALLoptim_memory_list <- vector(mode = "list", length = length(buffalo_ids))

for(i in 1:length(buffalo_ids)) {
  
  tic(msg = "Adding previous space use density to used and random steps")
  
   buffalo_ALLoptim_memory_list[[i]] <- buffalo_data_rand_steps %>% 
     filter(id == buffalo_ids[i]) %>% 
     mutate(kde_memory_density_log = spatial_temporal_density_function(
       ., id_val = buffalo_ids[i],
       memory_period = memory_period,
       memory_delay = memory_delay,
       # single bandwidth and temporal decay parameter for all buffalo
       spatial_sd = mean_kde_sd,
       temporal_decay_gamma = exp_gamma_ALLoptim)) 
   
   toc()
}

buffalo_data_rand_steps_memory_ALLoptim <- bind_rows(buffalo_ALLoptim_memory_list)
write_csv(buffalo_data_rand_steps_memory_ALLoptim, 
          paste0("outputs/buffalo_popn_GvM_KDEmem_allOPTIM_10rs_", Sys.Date(), ".csv"))

beep(sound = 2)

Session info


sessionInfo()
## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_New Zealand.utf8  LC_CTYPE=English_New Zealand.utf8    LC_MONETARY=English_New Zealand.utf8
## [4] LC_NUMERIC=C                         LC_TIME=English_New Zealand.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_1.1.1       ggExtra_0.10.1      ggh4x_0.2.6         Rfast_2.0.7         RcppZiggurat_0.1.6 
##  [6] formatR_1.14        scales_1.2.1        glmmTMB_1.1.8       clogitL1_1.5        Rcpp_1.0.10        
## [11] ecospat_3.5         TwoStepCLogit_1.2.5 survival_3.5-5      viridis_0.6.2       viridisLite_0.4.2  
## [16] matrixStats_1.0.0   patchwork_1.1.2     ggpubr_0.6.0        adehabitatHR_0.4.21 adehabitatLT_0.3.27
## [21] CircStats_0.2-6     boot_1.3-28.1       MASS_7.3-59         adehabitatMA_0.3.16 ade4_1.7-22        
## [26] sp_1.6-0            ks_1.14.0           beepr_1.3           tictoc_1.2          terra_1.7-23       
## [31] amt_0.2.1.0         lubridate_1.9.2     forcats_1.0.0       stringr_1.5.0       dplyr_1.1.2        
## [36] purrr_1.0.1         readr_2.1.4         tidyr_1.3.0         tibble_3.2.1        ggplot2_3.4.2      
## [41] tidyverse_2.0.0    
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.3             tidyselect_1.2.0       lme4_1.1-32            htmlwidgets_1.6.2     
##   [5] grid_4.2.1             pROC_1.18.0            munsell_0.5.0          codetools_0.2-19      
##   [9] ragg_1.2.5             units_0.8-1            miniUI_0.1.1.1         withr_2.5.0           
##  [13] audio_0.1-10           colorspace_2.1-0       highr_0.10             knitr_1.42            
##  [17] rstudioapi_0.14        ggsignif_0.6.4         Rdpack_2.4             labeling_0.4.2        
##  [21] emmeans_1.8.5          TeachingDemos_2.12     bit64_4.0.5            farver_2.1.1          
##  [25] coda_0.19-4            vctrs_0.6.2            generics_0.1.3         TH.data_1.1-2         
##  [29] circular_0.4-95        xfun_0.39              timechange_0.2.0       randomForest_4.7-1.1  
##  [33] R6_2.5.1               isoband_0.2.7          cachem_1.0.7           reshape_0.8.9         
##  [37] promises_1.2.0.1       vroom_1.6.1            multcomp_1.4-23        nnet_7.3-18           
##  [41] gtable_0.3.3           mda_0.5-3              sandwich_3.0-2         rlang_1.1.0           
##  [45] systemfonts_1.0.4      splines_4.2.1          rstatix_0.7.2          TMB_1.9.10            
##  [49] earth_5.3.2            broom_1.0.4            checkmate_2.1.0        biomod2_4.2-2         
##  [53] yaml_2.3.7             reshape2_1.4.4         abind_1.4-5            backports_1.4.1       
##  [57] httpuv_1.6.9           Hmisc_5.0-1            tools_4.2.1            nabor_0.5.0           
##  [61] ellipsis_0.3.2         raster_3.6-20          jquerylib_0.1.4        RColorBrewer_1.1-3    
##  [65] proxy_0.4-27           plyr_1.8.8             base64enc_0.1-3        classInt_0.4-9        
##  [69] rpart_4.1.19           zoo_1.8-12             cluster_2.1.4          tinytex_0.48          
##  [73] magrittr_2.0.3         data.table_1.14.8      mvtnorm_1.1-3          fitdistrplus_1.1-8    
##  [77] mime_0.12              hms_1.1.3              evaluate_0.20          xtable_1.8-4          
##  [81] mclust_6.0.0           gridExtra_2.3          compiler_4.2.1         KernSmooth_2.23-20    
##  [85] crayon_1.5.2           minqa_1.2.5            htmltools_0.5.5        later_1.3.0           
##  [89] mgcv_1.8-42            tzdb_0.3.0             Formula_1.2-5          DBI_1.1.3             
##  [93] sf_1.0-12              Matrix_1.6-5           car_3.1-2              permute_0.9-7         
##  [97] cli_3.6.1              rbibutils_2.2.13       parallel_4.2.1         pkgconfig_2.0.3       
## [101] numDeriv_2016.8-1.1    foreign_0.8-84         foreach_1.5.2          bslib_0.4.2           
## [105] estimability_1.4.1     plotmo_3.6.2           digest_0.6.31          pracma_2.4.2          
## [109] vegan_2.6-4            rmarkdown_2.21         htmlTable_2.4.1        PresenceAbsence_1.1.11
## [113] shiny_1.7.4            gtools_3.9.4           nloptr_2.0.3           lifecycle_1.0.3       
## [117] nlme_3.1-162           jsonlite_1.8.4         carData_3.0-5          fansi_1.0.4           
## [121] pillar_1.9.0           lattice_0.21-8         fastmap_1.1.1          plotrix_3.8-2         
## [125] glue_1.6.2             gbm_2.1.8.1            iterators_1.0.14       bit_4.0.5             
## [129] class_7.3-21           stringi_1.7.12         sass_0.4.5             maxnet_0.1.4          
## [133] textshaping_0.3.6      poibin_1.5             e1071_1.7-13           ape_5.7-1